| |
description |
Rating the similarity of two or more text documents is an essential
task in information retrieval. For example, document similarity can
be used to rank search engine results or to cluster documents by
topic. A major challenge in calculating document similarity is that
two documents can share a topic, or even mean the same thing, while
using different wording to describe the content. A sophisticated
algorithm will therefore not operate directly on the texts but will
have to find a more abstract representation that captures the
texts' meaning. In this paper,
we propose a novel approach for calculating the similarity of text
documents. It builds on conceptual contexts that are derived from
the content and structure of the Wikipedia hypertext corpus.
|
publisher |
Cancun, Mexico: IEEE Computer Society
|
type |
Text
|
| Article in Proceedings
|
source |
In: ICDS2009: Proceedings of the 3rd International Conference on
Digital Society, pp. 322-327
|
contributor |
IPVS, Anwendersoftware
|
subject |
Information Storage and Retrieval (CR H.3)
|
| Information Search and Retrieval (CR H.3.3)
|